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This paper extends Eigen's quasispecies equations to account for the semiconservative nature of 
DNA replication. We solve the equations in the limit of infinite sequence length for the simplest 
case of a static, sharply peaked fitness landscape. We show that the error catastrophe occurs 
when (i, the product of sequence length and per base pair mismatch probability, exceeds 2 In 1+ i/ fc , 
where k > 1 is the first order growth rate constant of the viable "master" sequence (with all 
other sequences having a first-order growth rate constant of 1). This is in contrast to the result of 
lnfc for conservative replication. In particular, as fc — > oo, the error catastrophe is never reached 
for conservative replication, while for semiconservative replication the critical [i approaches 2 In 2. 
Semiconservative replication is therefore considerably less robust than conservative replication to the 
effect of replication errors. We also show that the mean equilibrium fitness of a semiconservatively 
replicating system is given by fc(2e _M ' 2 — 1) below the error catastrophe, in contrast to the standard 
result of fce _M for conservative replication (derived by Kimura and Maruyama in 1966). 
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I. INTRODUCTION 

In 1971, Manfred Eigen introduced the quasispecies 
formulation of molecular evolution to explain the ob- 
served distribution of genotypes in RNA evolution ex- 
periments |l|,|2|. The central result of his model was that 
due to mutations, the equilibrium distribution of geno- 
types did not consist of a fittest sequence, but rather a 
set of closely related strains, which Eigen termed a "qua- 
sispecies." Eigen showed that a stable quasispecies only 
exists if the mutation rate is kept below a threshold value. 
Above this value, the distribution of genotypes undergoes 
a second-order phase transition termed the error catas- 
trophe, in which the distribution completely delocalizes 
over the gene sequence space. Subsequent studies on the 
quasispecies model have focused almost exclusively on 
the error catastrophe [E 0, IE IE 0, IE IE El , though there 
has also been some work on the dynamical aspects of 
the equations 0, 0] . More recently, other phase tran- 
sitions besides the error catastrophe (e.g. the so-called 
"repair catastrophe" ) have been shown to arise from the 
quasispecies equations 

MM- 

A common feature of previous work on the quasis- 
pecies equations has been the implicit assumption that 
the genome of an organism could be written as a linear 
symbol sequence, and that replication occurs conserva- 
tively (that is, the original genetic material is preserved 
during replication). These two assumptions allow for a 
relatively straightforward derivation of a system of equa- 
tions modelling the evolution of a unicellular, asexual 
population. In the simplest formulation, we assume that 
each organism has a genome a = S1S2 . . . Sl of length L, 
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where each "letter" or "base" Si is drawn from an alpha- 
bet of size S (= 4 for all known terrestrial life). We as- 
sume first-order growth kinetics, and that the genome de- 
termines the first-order growth rate constant, or fitness, 
denoted by k ct (in general, n a will be time-dependent, 
reflecting the generally dynamic nature of the environ- 
ment). Furthermore, we assume a per base replication 
error probability of e CT . If we let x a denote the fraction 
of organisms with genome a, then it may be shown that 
0,0, 

= ^2 K m{o-', <j)X a i - R{t)x a (1) 
a' 

where R(t) = n a x a is simply the mean fitness of the 
population, and K m (a' , a) is the first-order mutation rate 
constant for mutations from a' to a. If p m {a' , a) denotes 
the probability of mutation from a' to a, then it is clear 
that K m (cr',(T) = K a ip m (a' ,0). If we let HD(o',a) de- 
note the Hamming distance between a' and cr, then it is 
possible to show that, 

p m (a', a) = (^^'^(l - £al) L-HD(„',« } (2) 

The simplest formulation of these equations considers a 
genome- independent replication error probability e, and 
a time-independent fitness landscape characterized by a 
single "master" sequence gq of fitness k > 1, with all 
other sequences set to a fitness of 1. This so-called Sin- 
gle Fitness Peak model (SFP) has been the subject of 
considerable theoretical treatment 

IE H H (and refer- 
ences therein). The central result of this model is that, 
in the limit of L — > 00, the mean equilibrium fitness of 
the population is given by fce~ M for /i < lnfc, and 1 for 
/i > In fc, where /i = Le. When /i < In k, the population 
is localized in a cluster about the master sequence, result- 
ing in what Eigen called a quasispecies. When /1 > lnfc, 
the population is completely delocalized over the gene se- 
quence space, so that no discernible quasispecies exists. 



The transition between the two regimes, at fi C rit = m k, is 
known as the error catastrophe. It should be noted that 
the result of ke~^ was first derived in 1966 by Kimura 
and Maruyama |l5j . and is a standard result in theoret- 
ical population genetics. 

While the assumption of a linear symbol sequence and 
conservative replication is correct for modelling single- 
stranded RNA, a proper extension of the quasispecies 
model to real organisms should take into account the 
double-stranded nature of DNA, and also the semicon- 
servative nature of DNA replication. In semiconserva- 
tive replication, the original DNA molecule is not pre- 
served after replication. Rather, each strand serves as 
the template for the synthesis of a complementary daugh- 
ter strand, meaning that after replication, each DNA 
molecule consists of one parent and one daughter strand 

The formulation of the quasispecies equations given 
above are inadequate to describe evolution with double- 
stranded, semiconservatively replicated genomes. There 
are two reasons for this: First of all, because DNA 
is double-stranded, there is no well-defined Hamming 
distance between two DNA molecules. Furthermore, 
because in semiconservative replication the original 
molecule is destroyed, a mathematical formulation of this 
process must incorporate an effective death term, which 
is clearly lacking in the quasispecies equations for con- 
servative replication. 

The goal of this paper is to extend Eigen's formula- 
tion of the quasispecies equations, to account for the 
double-stranded and semiconservative nature of DNA 
replication. This is a necessary first step toward making 
the quasispecies equations a quantitative tool for analyz- 
ing the evolutionary dynamics of unicellular organisms. 
Then, after obtaining the form of Eigen's equations for 
the case of double-stranded DNA, we wish to proceed and 
solve these equations for the simplest landscape, that of 
the static single fitness peak. 

This paper is organized as follows: In the following 
section, we present an overview of DNA sequence anal- 
ysis and replication mechanism, followed by a deriva- 
tion of the appropriate quasispecies equations. We con- 
tinue in Section III with a discussion of the single fitness 
peak model. Specifically, we present the infinite sequence 
length equations, leaving the details of the derivation, 
which are fairly involved, for Appendix A. We then go on 
to discuss the error catastrophe, presenting both analyt- 
ical results and numerical corroboration using stochastic 
simulations of replicating populations. In Section IV, we 
discuss our results, and also the extension of our equa- 
tions to multiple gene models. Finally, we conclude in 
Section V with a summary of our results, and a discus- 
sion of future research plans. 
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FIG. 1: The anti-parallel nature of double-stranded DNA and 
RNA 



II. DERIVATION OF THE QUASISPECIES 
EQUATIONS FOR SEMICONSERVATIVE 
REPLICATION 

A. An Overview of DNA Sequence Analysis 

Double-stranded DNA consists of two anti-parallel, 
complementary strands. During transcription, messen- 
ger RNA (mRNA) is synthesized in the 5' to 3' direction. 
The DNA template strand from which RNA synthesis oc- 
curs is known as the anti-sense strand, and is read in the 
3' to 5' direction. The complementary strand, the sense 
strand, has the same sequence as the transcribed mRNA, 
and is "read" in the 5' to 3' direction (the quotes are to 
indicate that the sense strand does not directly partic- 
ipate in the transcription process). We therefore adopt 
the convention that DNA and RNA sequences are read 
in the 5' to 3' direction, as illustrated in Figure 1. How- 
ever, this convention is arbitrary, and it is equally valid to 
read DNA and RNA sequences in the 3' to 5' directions. 
Once a convention is adopted, the anti-parallel nature of 
double-stranded DNA (or RNA) means that the comple- 
mentary strands are read in opposite directions. A more 
detailed explanation can be found in [16|. 

We consider a double-stranded DNA molecule with 
generalized alphabet of size 25, consisting of "letters" 
0, 1, . . . , 25 — 1. Each "letter" i is assumed to uniquely 
base pair with (i + 5) mod 25. For actual DNA, we of 
course have 5 = 2, and we may make the assignment 
A -» 0, G -> 1, T -» 2, C -> 3. 

Given a DNA molecule of sequence length L, let one of 
the strands be denoted by a — b\ ... 6l ■ If the comple- 
ment of a base bi is denoted by bi, then the complemen- 
tary strand is given by a = ■ ■ ■ b%. Note that a = a, 
and therefore, each DNA molecule may be denoted by 
the set {a, a} = {a, a}. 

For single-stranded molecules of length L and alphabet 
size 25, there are (25) L distinct sequences. We seek to 
derive the analogous formula for double-stranded DNA. 
Given a DNA molecule {a, a}, define the internal Ham- 
ming distance lj = HD(<t,<j). If we let Ni(li;L) denote 
the number of DNA molecules of length L with internal 
Hamming distance Zj, then the total number of distinct 



3 



sequences is simply given by Ntot — J2i I= o L). We 

therefore proceed to compute Nj(lj; L). Due to the pos- 
sibility of palindromic molecules (cr = a), we need to 
consider the case of L even and L odd separately. 

Given some DNA molecule {cr, cr}, with a = b\ ... b^, 
suppose we have bi = for some i. Then bi — 

bL-i+i, and hence equality between corresponding bases 
in a and a comes in pairs whenever i ^ L — i + 1. This 
must always be true, since, Hi = L — i + 1, then bi — 
bh-i+i =^ b% = bi, which is impossible. Therefore, cr and 
a must be equal at an even number of sites, hence li must 
be odd for odd L and even for even L. 

Suppose L is odd, so L = 21 + 1, and consider some li = 
2k + 1. We have complete freedom to choose b\, . . . , 
We automatically have ^ Thus, we have 

I — k remaining sites among b\ , . . . , bi where we choose 
bh-i+i such that = bi. Equivalently, we have k 

sites among b\, . . . ,bi where we choose bi,~i+i such that 
b~L-i+\ 7^ bi. There are ( l k ) ways of choosing these sites, 
and for each such choice, there are 2S — 1 possible values 
for each 6l-i+i taken to be distinct from bi. Putting 
together all the degeneracies, we obtain Q (2S) l+1 {2S - 
l) k ways of choosing a such that HD(a, a) = 2k + 1 = lj. 
However, this still does not give us the set of all distinct 
DNA molecules {cr, cr} with internal Hamming distance 
li, for if a ^ cr, then our counting method generates a 
given {a, a} twice, by generating both a and a. Since 
cr = a if and only if lj — 0, which is impossible for odd 
L, we have, finally, that, 



for odd L. Thus, for odd L, 

l 



{2S) l+1 {2S- 1) A 



(3) 



N tot = k2S) l+1 (l) ( 2 S - 1)"' = klS) L (4) 



k=0 



If L is even, then we may write L = 21. In this case, 
lj is also even, and so lj — 2k for some k = 0, . . . , I. We 
have complete freedom to choose b\ , . . . , bi . Proceeding 
as with the analysis above, we may show that there are 
(^)(2S')'(25-l) fc ways of choosing a so that HD(a,a) = 
h. If 1 1 ^ 0, we need to divide by 2 to get the set of all 
distinct DNA molecules with internal Hamming distance 
li. Therefore, 
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l) k for k ^ 
for k = 



for even L. Therefore, for even L, 

N tot = l -{2S) L + \{2SY- 



(5) 



(6) 



B. Modelling DNA Replication 

The replication of DNA during cell division may be 
divided into three stages, which are illustrated in Figure 
2. 

The first stage of DNA replication is strand separation, 
with each parent strand serving as a template for synthe- 
sizing the complementary daughter strands ^(| . We may 
model this stage by writing that a given DNA molecule 
{a, a} separates into the single-stranded sequences a and 
a. 

As strand separation occurs, daughter strand synthe- 
sis is catalyzed via enzymes known as DNA replicases. 
However, due to errors in the base pairing process, a is 
not necessarily paired with a. Rather, once cell division 
is finished, the original a is paired with some a' , and 
similarly for a. 

Each genome {a, a} has a characteristic replication 
mismatch probability ^{ a ,a} (a base-pair-independent 
mismatch probability is certainly a simplification, but it 
is an initial starting point). Different genomes may have 
different replication fidelities, due to various replication 
error correction mechanisms which may or may not be 
functioning. For example, in Escherichia coli, the DNA 
replicase Pol III has a built-in proofreading mechanism 
which excises mismatched bases in the daughter strand 
|16|. In addition, in many prokaryotes and eukaryotes, 
DNA daughter strand synthesis is followed by mismatch 
repair 0] , which can distinguish between the parent and 
daughter strands, thereby allowing the proper repair of 
mismatches. All such repair mechanisms are gathered 
within £{ a ,s-} m our model. 

In the final stage, DNA replication and cell division 
is complete, and the parent and daughter strands have 
become indistinguishable. Remaining mismatches in the 
daughter cells' DNA are eliminated by various mainte- 
nance enzymes, which recognize and repair the lesions 
caused by mismatched base pairs. However, because it is 
impossible to determine which strand has the incorrect 
base, the mismatch is correctly repaired with probabil- 
ity 1/2. The result is that the a, a' pair is converted 
to some a", a", giving the DNA molecule {a", a"}. A 
similar process happens for the parent a strand. 

We wish to derive the probability that a given par- 
ent strand a produces {a" ,<r"} in the daughter cell. 
Let us denote this probability by p(a,{a" Also, 
let p(a,a') denote the probability that a is paired 
with a' during daughter strand synthesis, and let 
p((<7,a'), (a", a")) be the probability that a — » cr", a' — > 
a" during post-replicative lesion repair. Then we have, 
assuming a" ^ a", that, 



p(a,{a",a"}) = J>(^') 



Note the additional ^(2S)z term arising from the con- 
tribution of the palindromic sequences. 



[p((a,a'),(a",a"))+p((a,a'),(a",a"))] 
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FIG. 2: The three stages of DNA replication 
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If cr" = cr", then only one of the sums is kept. 

We now proceed to compute 



E^pfo ("".*"))■ Write a = 
a' = foi...6' L , and cr" = 6i" ...6 L "- Let J = HD(a,a"). 
Let us consider some i for which bi — bi" . Then 6^ 
can take on any value, for if b\ = 6,", then no repair 
is necessary, and we obtain 6, — ► (b" ,b"). If 6^ 7^ 
then repair is necessary, and with probability 1/2 it 
b\ that is repaired to giving once again that 
— ► (£>»", £>i"). So, let us now consider some i for which 
7^ 6j". Then b\ must be equal to bi" . Otherwise, if 
= bi ^ bi" , then no lesion repair occurs, and we get 
— > (6i,6j) 7^ If 6- 7^ 6j, then 6- is repaired 

with probability 1/2 to bi, or 6j is repaired with proba- 
bility 1/2 to 74. Thus, either 6, -> (&;,&;) 7^ (&»",&,"), or 
6, — > (bj, 6J) 7^ (6i" , and so the corresponding cr' docs 
not contribute to the sum £ CT , p(a, <r r )p((a, cr'), (cr", a")), 
since p((cr, cr'), = 0. 



Our analysis allows us to perform the sum, assuming 
a probability ^{ a .s} of a mismatch. For a given cr', let ?' 
denote the number of sites among the L — I sites which 
are equal in cr and cr" for which 6- 7^ There are 



( ,, )(25 f — 1)' ways of choosing such a cr'. The probabil- 
ity p(a, cr') is equal to ( — 
probability p((a,a') 



L , 9 _l)l'+l(l_ e The 

ct",ct")) is then (5)''+', so multi- 
plying by the degeneracy in I' and summing over all I' 
gives, 



£M^>((^')>'>")) = £(% Z )(25-l)''(|^)^(l-e {(7i , } )^'(i)^ 



If we define T=HD(a,a"), then we obtain that p(cr, cr>((cr, cr'), (a" , cr")) = - ^l) L ~l Now, note 

that HD(b 1 ...b L ,b[...b' L ) = HD_(b L ...b 1 ,b' L ...b' 1 ), and HD(h . . .b L ,b' L . . .b[) = HD(b L . . .b u b[ . . .b' L ), so that 
I = HD(a,<j") = HD(a 7 a"), and I = HD(cr,a") = HD(a,a"). Therefore, we obtain that, 



I (^S-T^) (1 — li^±) L -' for cr" = a" 

r 



C. The Quasispecies Equations 



We are now ready to derive the quasispecies equations 
for semiconservative replication. We consider a pop- 
ulation of unicellular, asexually replicating organisms. 
Let n{a,s} denote the number of organisms with genome 
{cr, a}. We let H{ a . S ] denote the first-order growth rate 
constant of organisms with genome {a, a}. Then from 
the replication mechanism illustrated in Figure 2, we ob- 



tain the system of differential equations given by, 



dt 



— - K {(T,<T}^{<7,Cr} 



2_, K {a',5>} n {a>,5>} X 
{*>,*>} 

(p(a',{a,a})+p(a',{a,a})) (10) 

The first term is a death term which takes into account 
the destruction of the original genome during replication. 
The terms in the summation take into the account the 
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production of {<j, a} from both a' and a'. 

We now define n = J2{a,a} n W^}^ and X W,*} = 
n {<j,a}/ n - Reexpressed in terms of the population frac- 
tions X{ a ^}, the dynamical equations become, 

dt - 2^> K {<t',c'} X {<t',<t'} x 

{*>,*>} 

(p(a',{a,a})+p(a',{a, a})) 

-(«{<r,ff} +Ht))x{a,*} (11) 

where R(t) = so is simply the mean 
I 



fitness of the population, which arises as a normalization 
term to ensure that the total population fraction remains 
1. 

We now proceed to put these equations into a form 
which is more easily amenable to analysis than the above 
equations. To this end, we make the following definitions: 
(1) n a = K{ a ,g}, so that k s = K a . (2) e a = so that 

e CT = e s . Finally, y a = \x{ a ^ if a ^ a, and y a = X{ a .gy 
if a = a. Clearly, we also have that y a = y s . 

Now, 



E K W,a'}X{ a ',g'}{p{cr',{a,a})+p{<j',{a,a})) = E K{<T',a>}X{a>,a>}(p((7', {&,(?}) + p(a' , {a, a})) 

W,<?'} {cr',CT'},cr'#CT' 

+ E k {<j' ,s'}X{ a > M>}{p{cr' , {c, ct}) +p(a', {a, a})) 

{a',a'},cr'=a' 

E n a ,x {cJ , t g, } p((j',{(j,d}) 
+ E ^s'X {<y ^g, } p{d' ,{a,a}) 

+ Y 2K v' x {a>,a>}P(v',W,v}) 

= Y K v> X {a>M>}P(v',W,v}) +2 Y K ^' X {cr'.S'}P(< J 'A (J ^}) 

(12) 

Therefore, for a ^ a, 

a' .<j' ^<j ! 
a' ,a' = a' 

-{n a + R{t))y<j 
-{n a + k(*))j/<t 

£ ct'/2 \HD(<T,cr')ri e <?' \L-HD(a.cr') , I e S' /2 



er' a' 

-{n a + R(t))y a 

2E^'^'(^)^' <7 ' ) (1 - ^f)™^ - («„ + «(«))„„ (13) 



For <T = (Twe get, 

dy a dx {a g } / / c -i \ j 

2 E K ^'^'-P( (7 ''{ cr '^}) - + R { t ))v>y 



a' a' —a' 
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= 2£w(^) ffD(CTV) (l - («„ + (14) 

cr' 



Since we obtain the same set of equations for palin- 
dromic and non-palindromic molecules, the final form of 
our quasispecies equations becomes, 

a' 

-{n a + R(t))y a (15) 

It is readily shown that R(t) — n a y a . It is also read- 
ily shown that y a = y s for all a implies that dy^/dt = 
dy s /dt for all cr, and so y a — ys is preserved by the 
evolution. 



III. THE SINGLE FITNESS PEAK 
A. Overview and Analytical Results 

In the single fitness peak model, there exists a unique, 
master genome {<To, cto} with fitness k > 1, with all other 
genomes having fitness 1. Our fitness landscape is there- 
fore given by K ao — n S(j = k, while k„ = 1 for a ^ ao, &o- 
We also assume that e CT is independent of cr, so that 
e a = e. For this landscape, we wish to obtain the equi- 
librium behavior of the system of differential equations 
given by Eq. (15). 

For the case of conservative replication, the single 
fitness peak model may be solved by first grouping 
the genomes into Hamming classes 3J. Specifically, 
given the master sequence Co, we may define HC(l) = 
{<t\HD(<t 7 <jo) = I}- If x a denotes the population frac- 
tion with genome cr, then we define zi — J2aeHC(i) x °- 
The quasispecies equations are then reexpressed in terms 
of the zi, and the equilibrium equations may be readily 
solved in the limit of infinite sequence length, since the 
backmutation terms become negligible. The result is, 

dzi 1 1 

-4 = e ~ M X) n^-'i/^-'i ( 16 ) 

h=0 l ' 

where «/ = k for I = 0, and 1 for I > 0, R(t) = (k— 1)zq + 
1, and fi = Le in the limit L — > oo. 

For the case of semiconservative replication, the sin- 
gle fitness peak model for double-stranded genomes be- 
comes an effectively two fitness peak model. Thus, it is 
not possible to directly group the genomes into Hamming 
classes. Nevertheless, the single fitness peak for double- 
stranded genomes is solvable. The details of the solution, 
which are fairly involved, may be found in Appendix A. 
The final result, however, is simple to understand. In 
the limit of infinite sequence length, cro an d cro become 
infinitely separated. Therefore, locally around ctq, ao we 
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have an effectively single fitness peak model. We may 
therefore exploit the local symmetry of the landscape 
and define Hamming classes around cr and cr . Thus, 
HC(ao;l) = {(t\HD(<t, cro) = I}, an d similarly for cro. 
We may then define wi = J2<reHC(<r ;l) and ^ ma y 
be defined similarly with respect to ao- However, by sym- 
metry of the landscape we have wi = wi, and so need only 
consider the dynamics of the wi . In Appendix A, we show 
that when expressed in terms of the wi , the quasispecies 
equations become, 

= 2e ^ /2 E h { 2 - (* + «(*))«* 

h=0 

(17) 

In this case, n{t) = 2(k — 1)wq + 1. The reason for this 
is that w is only the fraction of the population on the 
fitness peak at cro- By the way we defined our y CT , the 
total fraction of viable organisms is given by wq + wo — 
2w . 

We begin the solution of the infinite sequence length 
equations by solving for the equilibrium value of wq. We 
have, 

^ = 2ke-»/ 2 w -{k + R(t))w (18) 

which admits the solutions wq — 0, fc ^ 2e 2( - fc / _~^~ 1 . Multi- 
plying by 2, we get the equilibrium solution for xi ao ^ \ 

of or fc< - 2e fc _ 1 ~ 1 ' > ~ 1 . To determine the domain of va- 
lidity of these solutions, we note that we want wq = 1/2 
for fi = 0. That is, when replication is perfect, then the 
population resides entirely on the fitness peak {cro,^}. 
We must also have wq > 0, which holds as long as 
fc(2 e -M/2 _l)_l>0=*/i< [i crlt = 21n TT 2 - 7I . There- 
fore, by continuity, we have that for /i < Hcrit, the equi- 
hbrium solution is w = — — 2 (k-i) — ' f 1 > Vcrit, 
the equilibrium solution becomes wq = 0. The transition 
between these two solutions regimes is known as the error 
catastrophe. 

In dealing with conservative replication, another pa- 
rameter of interest which we consider is the localization 
length, defined as (I) — YaZi ^ z ii where zi denotes the 
population fraction at Hamming distance / from the mas- 
ter sequence. We wish to extend the definition of localiza- 
tion length to our model. The complication here is that 
in the limit of infinite sequence length, the Hamming dis- 
tances I and / to cro and <7o (respectively) cannot be simul- 
taneously finite. However, as mentioned previously, the 
fraction of the population at a Hamming distance I from 
cro, given by wi, is equal to the fraction of the population 
at a Hamming distance / from ao, given by u>;. Therefore, 
an appropriate definition for the Hamming distance is to 
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Parameter 

f-l'crit 

■^master flcrit ) 

R(t = OO) (fl < flcrit) 
(I) (fl < flcrit) 



Conservative Semiconservative 



lnfc 

fce~^-l 
fc-1 



2 In- 



1+1/* 

fc(2e~^ 2 -l)-l 



/' 



fc(2e 

fc(2e 



-M/2 _ 



fc(2e-' J / 2 -l)-l 



- Conservative 

- Semiconservative 



TABLE I: Comparison of quasispecies equilibrium between 
conservative and semiconservative replication. It should be 
noted that R(t — oo) is simply the equilibrium mean fitness 
of the population. 





FIG. 4: Plot of «(t = oo) versus fi for k = 10. 
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FIG. 3: Plot of fi C rit versus k for both conservative and semi- 
conservative replication. 



define (I) = Y^tLi ^ w l- We may compute (I) by using a 
technique similar to the one developed in |14j . Briefly, a 
differential equation for the time evolution of (l) is de- 
rived from the evolution equations for the wi . The result 
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FIG. 5: Plot of (I) versus fx for fe = 10. 



Numerical Verification Using Stochastic 
Simulations 



Note that the localization length is finite for fi < f\i C riu 
but diverges at the error catastrophe. 

For convenience, Table I illustrates the difference be- 
tween conservative and semiconservative replication. 

Figure 3 shows a plot of fi cr it versus k for both the 
conservative and semiconservative cases. Figure 4 shows 
a plot of R(t = oo) versus fj, for k = 10 for both the 
conservative and semiconservative cases. Finally, Figure 
5 shows a plot of (I) versus fi for k = 10 for both the 
conservative and semiconservative cases. 



In order to allow for independent verification of the 
semiconservative error catastrophe, we employ stochastic 
simulations of evolving organisms rather than numerical 
integration of the equations themselves. The details of 
these simulations are given in Appendix B. 

Our simulations involve constant population sizes of 
10 4 organisms with genome lengths of 101 base pairs, 
using an alphabet size 2S = 4 to correspond with the al- 
phabet size of DNA. The master sequence on our SFP 
landscape replicates at each time step with probabil- 
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FIG. 6: Error catastrophe in a stochastic simulation of a finite 
population of semiconservatively replicating organisms. 



ity Pr, {(To, o-o } — 1; a U other sequences replicate with 
PR,{a,9} 5^{cto,cto} = 0.01. Transforming these replication 
probabilities to replication rates for use in the above 
equations, we have K{<j , So } = 100 given K{<,.B}^{a .5o} = 
1. 

We run our simulations using the above parameters. 
For those simulations in which the equilibrium value of 
^master > 0, we calculate the equilibrium fraction of 
viable organisms as the average of 100 equilibrium steps 
from 10 independent runs. For those simulations in which 
the equilibrium value of ^master = 0, we verify this be- 
havior in two independent runs. The equilibrium value 
of a; m aster ^ s calculated as above for various values of e, 
and the results are displayed in Figure 6. The predicted 
tcrit for the above parameters is indicated in Figure 6 
as a dashed line. Note the good agreement between the 
theoretical prediction of the error catastrophe and the 
numerical results. 



IV. DISCUSSION 

The key difference between conservative replication 
and semiconservative replication is the destruction of the 
parent genome in the semiconservative case, as opposed 
to its preservation in the conservative case. This is cap- 
tured by the functions versus 2e _AI / 2 — 1 in the formu- 
las given in Table 1. For conservative replication, is 
simply the probability of correct replication. This proba- 
bility is always positive, and so, by making k sufficiently 
large, it is possible to guarantee that the effective growth 
rate ke~^ of the master sequence stays above the growth 
rate of 1 for the unviable sequences. For semiconservative 
replication, the probability that each strand is matched 



with its proper complementary strand is e"^/ 2 . There- 
fore, since there are two parent strands, and the par- 
ent genome is destroyed during replication, we have the 
factor 2e~^/ 2 — 1, yielding an effective growth rate of 
k{2e~^/ 2 — 1). However, 2e _M / 2 — 1 is only positive when 
e -M/ 2 > 1/2, or when \i < 2 In 2. When the probabil- 
ity of correct daughter strand synthesis drops below 1/2, 
then the rate of production of viable genomes no longer 
exceeds the rate of destruction. The result is that repli- 
cating faster simply increases the rate of destruction of 
viable organisms, and therefore does not avoid the error 
catastrophe. 



The semiconservative quasispecies formalism may be 
naturally extended to more sophisticated models with 
more than one gene. In this paper, we focused on the 
10° single fitness peak model, in which the genome consists 
of a single, "viability gene," and the replication error 
probability is genome- independent. 

As an example, we may incorporate mismatch repair 
into the semiconservative, qua sispecies formalism. As 
with the conservative case [13|, H3, we consider a two 
gene model, in which one gene codes for viability, and 
the other codes for repair. Thus, a given genome {a, a} 
may be written as {<J V i a a rep , a rep a V i a }. As was done in 

li qI Ii il ,„„ „„„,™ 



'rep 7 ^rep 1 - 

|13l 1 1 ■ ll| , we may assume a single-fitness peak in both the 
viability and repair genes, so that there exist "master" se- 
quences (T viafi , a via fi, and a rePi o , &rep,o for both viability 
and repair, respectively. In the single-stranded formula- 
tion of the semiconservative model, a given a has a first- 
order growth rate k > 1 if a = <7 V i a ,a&rep or <Xrep<5ma,o- 
The growth rate constant is 1 otherwise. Furthermore, 
a has a functioning mismatch repair system with failure 
probability e r if a = a ma a rep ^, or a repfi a via . Otherwise, 
mismatch repair is inactivated. 

While we leave the solution of this two gene model for 
future work, we may nevertheless compute the location 
of the repair catastrophe. As with the case for conserva- 
tive replication, the repair catastrophe occurs when the 
effective growth rate constant of viable repairers drops 
below the growth rate of constant of viable non- repairers. 
For viable repairers, the effective growth rate constant is 
/c(2e~ £ ' /i / 2 — 1). We have for the non-repairers an ef- 
fective growth rate constant of viable organisms given by 
k(2e~( Lvia/L)tJ - /2 -l). The factor of L via /L arises because 
in dealing with the overall growth rate of the mutators, 
we are only concerned with the production of viable or- 
ganisms. The repairer gene does not need to be correctly 
replicated. 

The repair catastrophe then occurs when fc(2e -£rA1 / 2 — 
1) = k(2e- {L ^/ L ^/ 2 - 1), or when e r = L ma /L. 
Interestingly, this result is unchanged from the point- 
mutation, conservative result in |l3j |. or the full solution, 
conservative result in 0] . 



V. CONCLUSIONS 

This paper extended the quasispecies formalism to 
include the case of semiconservative replication, in or- 
der to allow for the more realistic modelling of the 
evolutionary dynamics of DNA-based life. While our 
model is currently most directly applicable to prokary- 
otic genomes, which generally consist of a single, circular 
DNA molecule, we believe that it forms the basis for 
future extension to genomes consisting of multiple chro- 
mosomes. 

After deriving the quasispecies equations for semicon- 
servative systems, we proceeded to solve them for the 
simplest landscape, that of the static single fitness peak. 
As with conservative replication, the solution of the single 
fitness peak yielded two regimes: A viable regime, where 
the population is localized about the "master" genome, 
and an unviable regime, where the population is delocal- 
ized over the genome space. The transition between the 
two regimes is known as the error catastrophe. 

The main difference between conservative and semi- 
conservative replication is that for conservative replica- 
tion, it is possible to push the error catastrophe to ar- 
bitrarily high replication error rates by increasing the 
growth rate constant of the master genome. In semicon- 
servative replication, on the other hand, the probability 
of correct replication must always be greater than 1/2, 
in order to avoid the error catastrophe. 

Semiconservative replication is therefore considerably 
less robust to the effect of mutagens than conserva- 
tive replication. Furthermore, the existence of a lower 
bound to semiconservative replication fidelity explains 
why above the error catastrophe, mutagenic agents kill 
more rapidly replicating cells faster than more slowly 
replicating cells. Thus, our model provides a mathemat- 
ical basis for explaining the efficacy of chemotherapeutic 
agents in treating cancers. 
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APPENDIX A: SOLUTION OF THE STATIC 
SINGLE FITNESS PEAK MODEL FOR 
SEMICONSERVATIVE REPLICATION 

1. Finite Genome Size Equations 

To begin, let us define the internal Hamming distance 
lj = HD((7q,<7q). Also, let erg, s denote the subsequence 
of bases where 00 and 00 are identical, and 00, ns and 
&o,ns denote the subsequences of bases in O and 00, 
respectively, where they differ. Then given some gene 
sequence 0, we can break it up into two subsequences 



9 

AGTGACTCATGA = a 

ATCTTAGAGAAG = a 

I I I I 
CTTCTCT AAGAT = ct 

<Vs = TTAA 

<Vns = ACTAGGAG 

5>,ns = CTCCTAGT 

<* S =GACG 

tf NS = ATGCTATA 

FIG. 7: Illustration of sequence decomposition into as and 
<7jvs components. 



0s and 07V5- as denotes the subsequence of bases in 
corresponding to the subsequence of bases where 00, 00 
are identical. 0^5 denotes the subsequence of remaining 
bases. This is illustrated in Figure 3. 

Given some gene sequence 0, we can then characterize 
it by the following numbers: (1) Is = HD(as,ao,s)- (2) 
Ins = HD(ctns,<to,ns)- (3) Ins = HD(aNS,&o,Ns)- 
(4) Ins = Ins + Ins ~h, so l^s is simply the number of 
positions where <7ns differs from both ao t NS and 0o,jvs- 

Now, any a 1 may be generated from any by making 
the appropriate base changes. We can make changes to 
as follows: 

1. Let l\^s denote the number of changes to 05 where 
as and 00, s are identical. There are { L ~i'~ s ) (25— 

possibilities for this set of changes. 

2. Let ?2,5 denote the number of changes to 05 back to 
the corresponding base in 00. s, where as and 00, s 
are distinct. The degeneracy in this case is ) • 

3. Let Z 3j s denote the number of changes to as to 
bases distinct from the corresponding bases in 00, s, 
where as and 0o,s are distinct. The degeneracy is 
(\£ s )(2S-2)'3.*. 

4. Let Iu,ns denote the number of changes to aws 
where 0jvs, 00, are identical, to bases other than 
the corresponding ones in Oj ns- The degeneracy 
is ( l l~ l ™)(2S-2) lll - ira . 

5. Let li2,NS denote the number of changes to aNS 
where djvs, ao,NS are identical, to the corre- 
sponding bases in <Jo,ns- The degeneracy is 

III— Ins— Iii,ns\ 

V ll2,NS I' 
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6. Let hi t NS denote the number of changes to uns 
where uns, <7o,ns are identical, to bases other than 
the corresponding ones in <Jo,ns- The degeneracy 
is (y i 7^/)(2£-2)'".™». 

7. Let 1\2,ns denote the number of changes to <r N s 
where (Jns, &o,ns are identical, to the corre- 
sponding bases in <to,ns- The degeneracy is 

(Ii—Ins— hi, ns\ 

^ ll2,NS ' 

8. Let h,NS denote the number of changes to <jns, 
where aNS is distinct from cto.jvs and ao,NS, and 
the bases are changed to the corresponding bases 

in (Jns.o- The degeneracy is (j^fj. 

9. Let 12,ns denote the number of changes to cr/vs, 
where (7ns is distinct from oo.jvs and cto.ns, and 
the bases are changed to the corresponding bases 

in a N s,o- The degeneracy is C"^" 8 )- 

10. Let Iz^ns denote the number of changes to cr N s, 
where <jns is distinct from <t ,ns and (Jq.ns, and 
the bases are changed to bases other than the corre- 
sponding ones in ctjvs,o and <7jvs,o- The degeneracy 

iS ^ JVS ~ i 2.JVS-(2,JVS^25' — 3)^3, NS 

The series of changes to a defined above yield a a' 



which is at a Hamming distance of + h,s + h,s + 

hl,NS + ll2,NS + hl,NS + ll2,NS + h,NS + h,NS + h,NS 

from a. Furthermore, the values of Is, Ins, Ins, and Ins 
for a' are given by, 

l's = Is + h,s - h,s 

I'ns = Ins + hi.NS + Ii2,ns — Ii2,ns — h,NS 
Ins = Ins + hi,NS + I12, ns — Ii2,ns — h,Ns 
I'ns — Ins + 'n.ivs + hi,NS — h,Ns — h,NS (Al) 

Now, we will assume that y a depends only on Is, Ins, 
and Ins- At time t = 0, we start the evolution by setting 
y a = for all a ^ a , a , and y ao = y Sa = 1/2 if er 7^ 00, 
and 1 if er = a . Therefore, y a = unless Is = and 
Ins — or Ins — 0. So certainly y a depends only on 
Is, Ins, and Ins at the start of the evolution. Also, by 
similar reasoning, we see that the fitness landscape {n a } 
also depends only on Is, Ins, Ins- If we can show that 
this implies that dy a /dt depends only on Is, Ins, Ins, 
then y (j depends only on Is, Ins, Ins throughout the 
evolution. 

So, consider some time t for which the y a depend only 
on Is, Ins, Ins, f° r all given a characterized by Is, Ins, 
Ins- Then we may write Vi s ,i NS j NS = Va- We also write 
K a = n ls i N j . Summing over the contributions from 
the a', obtained by the base changes described above, we 
obtain, 



d Vis, 



hvS,h 



dt 



L—li—ls Is ls—h,s Ii—Ins h—lNS—hi,NS Ii—Ins Ii— Ins— hi, ns Ins hvs—h,NslNS—h,NS—h,NS 

2 E E E E E E E E E E 

h,s=0 h,s=0 h,s=0 Ih,ns=0 ii2,ws=0 i 11JVS =o Ii2,ns=0 h,NS=0 i 2 ns= q h,NS=0 

L — li — ls\ ( ls\ (Is - h,s\ (h - Ins\ (h — Ins ~ hi,Ns\ (h - Ins^ 

h,s ) \h,sj \ h,s ) V hi,Ns ) \ 1\2,ns ) \ hi,NS 
h - Ins - hi,Ns\ ( Ins \ /Ins — h,Ns\ (Ins ~ h.Ns — h.Ns^ 

ll2,NS J \h,NS/ V h.NS J V h,NS 

(25 - l) ll - s (25 - 2) h - s (2S - 2) hl - NS (25 - 2) hl - NS (25 - 3) /3 ' NS x 

I \h,S+h,S+h,S+hl,NS+hl,NS+h2,NS+h2,NS+h,NS+l2,NS+h,NS v 

( 25~ V X 

^2 _ — h,s— h,s— h,s— hi, ns— hi, ns— '12, ns— h2,NS~h,NS — h,NS— h,NS ^ 



X 



ls+h,S—h,S>lNS+hl,NS+h2,NS—h2,NS—h,NS,lNS+hl,NS+h2,NS—h2,NS—l2,NS 
yis+h,S—l2,S,lNS+hl,NS+h2,NS—h2,NS—l2,NsjNS+hl,NS+h2,NS — h2,NS — h,NS 

-(^S-lNsjNS+^yis^NsjNS ( A2 ) 

Note from the sum that dy a /dt = dy a i /dt for any two a, a' characterized by the same Is, Ins, and Ins- Therefore, 
the assumption that y a is determined by Is, Ins, Ins is justified. 
We may sum over l^g and h,NS to obtain, 



d Vls,lNS,lNS 

dt 



L—h—ls Is Ii—Ins h— Ins— hi, ns Ii—Ins h— Ins— hi, ns Ins hvs~h,NS 

- 2 E E E E _ E E E E 

h,s=0 l2,s=0hi,NS=0 Ii2,ns=0 ; 11JVS= o i 12>J vs=0 h,NS=0 l 2NS =0 
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L — li — l,s\ ( Is \ fh — Ins\ f h — Ins — Iii.ns 



2.S 



11, NS 



h — Ins\ fh — Ins ~ I 



hl.NS 



U,NS 



hi. 



NS 



ll2,NS 

f Ins \ /Ins - h,NS 



NS 



h.NS 



e/2 



(25_2)'".^+'".^(_)Ji. S (. 2s | 



h,S+hl,N S+hl,NS+h2,NS+h2,N S + h,N S + h,N S y. 



(1- 



e / 2 _y s -h,s 



(1- 



lNS — l2,NS — h,NS 



25- r v 25- l 7 

^ — — ' S_ ^ NS ~ ^' s ~ hl,NS~ ill,WS- il2,WS— ll2,NS y, 

K ls+h,S—h,s4NS + hl,NS + h2,NS—h2,NS~h,NsjNS+lll,NS+h2,NS—h2,NS—h,NS 
yis+h,S—h,S,lNS+hl,NS+h2,NS — h2,NS — h,NsjNS + hl,NS + h2,NS~ll2,NS — h,NS 

,InsJns 



(A3) 



Now, the total number of sequences a characterized by the Hamming distances Is, Ins, and Ins is given by 

CisMsMs = ( L is ll )(D& 2S - 2 ) lNS - Then define ZisMsMs = C ls , lNS ,I NS y ls , lNS J NS - We may 

convert our differential equations from the y to the z representations. After some tedious algrebra, the final result is, 



dz 



Is ,Ins,Ins 

dt 



L—lj—ls Is fa— Ins fa— Ins— In, ns fa— Ins fa—^NS—fai,NS Ins Ins— ^2,ns 

2 E E E E _ E E E E 

h,s=0 h,s=0 iii,ws=0 (i2,jvs=0 ( 11JVS =0 i 12jJ vs=0 '2,ws=0 ; 2 JVS= 



'l,S + — ^2,S 



£ / 2 )^(1 



25-1 



e / 2 y S -i 2 .s 



25-1 



x 



L-h-ls- h,s + h,s\ ^y 2 S x 
/ 2 

'ws - h,NS - h,NS + hi,NS + hi,Ns\ {Ins - h,NS - h,NS + hi, 



hl,NS 

£ / 2 _yil,NS+hl,NS ^ 



hl,NS 



lNS—h,NS — l2,NS 



25 -r v 25 -V 

h - Ins ~ hi,NS — Ii2,ns + h,NS + Ii2,ns\ fh ~ Ins - hi.NS — Ii2,ns + h,NS 

ll2,NS J \ h,NS 

£ / 2 \ll2.NS+h.WS s/ 

25-l j X 

h - Ins - hi,NS — Ii2,ns + Ii2,ns + h,Ns\ fh — hvs - hi,NS ~ Ii2,ns + I12, ns 
h,NS J \ I12.NS 

£ / 2 y2.NS+ll2.NS ^25 2y 2 - NS ^ 2 - NS (1 — — fjVS — h,S~ hi, NS— hi, NS~ h2,NS~ ll2, NS ys 

25 1 2 



K ls+h,S~ h,S ,hw S + hl,N S + ll2,N S — ll2,NS — h,N S ,hv S+hl,N S+ll2,N S — ll2,NS — h,NS 
^ls+h,S~ l2,S,lNS+hl,NS+ll2,NS — ll2,NS— h,N S Jn S+hl,N S+ll2,N S ~ h2,NS~ h,NS 

( k Is,Ins,Ins + R ( t )) Z ls,lNS,lNS 



(A4) 



r 



2. The Infinite Sequence Length Equations 

We are now in a position to derive the infinite sequence 
length form of the quasispecies equations. We allow L — > 
00 while keeping /i = Le fixed. Furthermore, let us define 
fi = h/L, so fi is the fraction of bases in cr and <Jo which 
differ. If we let p(fi) denote the probability density for 
//, then in the limit of infinite sequence length we obtain 
that p(fi) — ► S(fi — (1 — jg)), where 5 is the Dirac 5- 



function. Therefore, we take // = 1 — in the L — > 00 
limit. 

A slight complication arises in the infinite sequence 
limit, namely, that h = fiL —> 00 as L — ► 00. This 
means that it is impossible for Ins an d Ins to simul- 
taneously be finite. For if l^g is finite, then l^s = 
h — hvs + hvs = 00 an d vice versa. The appropriate way 
to solve these equations is therefore to solve for finite val- 
ues of Is, Ins, and Ins- Then we can redenote z ls i NS i NS 
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by z ; j j- , and solve in the infinite sequence limit. 
The symmetry of the landscape allows us to obtain the 
finite Ins population fractions as well, since the popu- 
lation fraction for finite Is, Ins, and Ins is then simply 
given by z, T r . 

I 



In the following subsection, we show that as L — > oo, 
the only terms which survive the limiting process are the 
h,s = hi,NS = hi,NS = h,NS = li2,NS = terms. We 
also have, 



L-h -ls + h,s\^ey 2S 
h,s J 2 



(A5) 



h — Ins + h,NS + Ii2,ns 

ll2,NS 



\ fh - Ins + h,Ns\ ^ f/g y 12 , NS+h , NS 
J \ h,NS ) 25-1 



1 



fin 



(l \L— Is— Ins— h,s— In, ns— hi, ns—Ii2,ns— Ii2,ns 

V 2' 



h,N s ] -h2,N s ] - 2(25 - 1) 

,-M/2 



(A6) 



(A7) 



Using the fact that fj — > 1 — ^ as L — > oo, we obtain, after some manipulation (and after redenoting Ki s i NS j NS by 
K is 1ns i ns ), the infinite sequence length equations, 



dz, , j 

is, Ins, t 

dt 



Is Ins— Ins Ins 



2e 



I 01 01 ll^Nslh.Nsl^S 1 

n,s=0 (i,ivs=U h,NS=0 



NS+h,NS 



(25-2)' 



K ls—h,s-jNS—h,NS—h,NsjNS—h,NS Z ls—h,sdNS—h,NS—h,NsjNS—h,NS 



(k, , r + n(t))z, , j 

y is, Ins, Ins y >' is, Ins, h 



(A8) 



where we have redenoted l2,s by l\,s, and 1\2.ns by 1\,ns- 

It should be clear that zo,o,o = Vvo- Therefore, 2zo,o,o is the total fraction of the population with genome {oo, oo}. 
This gives, R(t) = 2(k — l)z o o + 1- 

I 



Now, as L — > oo, the sequences cr and cr become in- 
finitely separated. Therefore, we expect that the values 



quence length equations. Before proceeding, however, 
we derive some basic inequalities which we will need to 



of. 



IsJnsJns 



for finite Is, Ins, Ins to be dictated by the use. First of all, note that each 



single fitness peak at ctq. Thus, for large L, we expect 
to obtain a locally single fitness peak model in which 
we can then assume that y a depends only on the Ham- 
ming distance Is + Ins to <r . In the following subsection, 
we prove this rigorously. We may then group the pop- 
ulation into Hamming classes, as with the single fitness 
peak for conservative replication. Specifically, we define 

Wl = Y!i ns = T!i N n S s=0 Z 1 -Ins,Ins,Ins^ aild finall y ° btahl 
the infinite sequence length equations given by Eq. (17). 



, , r must be < 1. 

is, Ins, 'jvs — 

< k. We also have, 



Additional Calculational Details 



Furthermore, note that k 1 l t 

C"™")A m = 112=1 H 1 ^ < ((n+l)% m , and (1 - A)" < 1 
for A e [0,1]. 



We wish to show that in the limit of L — ► oo, the 
only terms which contribute to the dynamical equations 
are the h t s = hi,NS — hi,NS — h,NS = Ii2,ns = 
terms. We prove this by showing that for each of the 
above indices, the total contribution from all the nonzero 
terms becomes arbitrarily small as L — > oo with fi = Le 
held fixed. 



a. Derivation of the Infinite Sequence Length Equations 
from the Finite Sequence Length Equatiions 

In this appendix, we derive the infinite sequence length 



So, we start 
the inequalities 
that the summand 



with the h t s index, 
given above, we 
of Eq. (A4), 



From 
may note 
denoted by 



form for dz, 



s,Ins,Ins 



/dt from the corresponding finite sc- has the upper bound, 



Is ,1 N s,l N s,h,s ,h,S ,hl,N S ,hl,N S ,h2,N S ,h2,N S ,h,N S ,h,N s ' 



5 - 

Is ,In S,lNS,h,S ,h,S ,hl,N S ,hl,N S ,ll2,N S ,ll2,N S,h,N s ,h,t 
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< Ws + 1)(2^ I )) ,1 - S ((£ + 1)| )' 2 ' s ((^s + llt^W^^ffc + l)!^^)' 11 ^ x 



(A9) 

Now, at fixed (j,, choose L to be sufficiently large so that + = \{fi+e) < fi. Then certainly (Zj + 1)( 2 s_i ) < 2 s-i ■ 
We then have, 



L—lj—ls Is Ii—Ins Ii—Ins— 'ii.ns Ins— Ins Ins— Ins— hi, ns Ins Ins—12,ns 

E E E E E E E E 

!l,S = l (2,S=0 (ll,NS=0 (l2,JVS=0 i 11JVS= il2,NS=0 '2,JVS=0 i 2 ,JVS=0 

5 - - 

Is, In s, In s, h, s,h,s, hi, NS,h2,NS, hi, NS,h2,NS,h 



L-h-h_ 



<k x: ((^ + i)(^ T )) ii ' s E ^ 2,s E ((^ + i)(^ T ))' 1 ^ 



<<2,NS,l2,NS 

Ii—Ins 



e/2 



h,s=l 



h —In s 

E {(Ins-Ins + 1)( 

il2,NS = 



25-1 



2,S=0 ill,JVS=0 

icy In s— I ns in 

''- )) h ^ s E ((^s + i)(^ T )) ii1 ^ 



E ((^ + l)(: 

ill,NS=0 



Ins 



E (^t)' 12 '" S E ^ NS E ((^s-iivs + l)^))^ 

'l2,NS=0 '2,NS=0 



l 2,NS— 



(A10) 



Now, let Ai = EL=oM fc - Also, note that £" =m A" = 
^tt, for |A| < 1. Therefore, note that an upper bound 
for the product given above is simply, kAf +l s ({ls + 
l NS + l)(e/2))/(l - (i s + l NS + l)(e/2)) 5 . Therefore, as 
L — > oo, so that e — > in such a way that /i is fixed, 
we see that the contribution of the Zi,s > terms to the 
evolution dynamics approaches 0. Therefore, we need 
only consider the h t s = terms in the limit of infinite 
sequence length. Using a similar argument to the one 
given above, we can systematically eliminate the contri- 
butions from the hi,NsJi2,NS,hi,NS,h,NS > as well. 
This establishes the infinite sequence length form of our 
differential equations. We should note that convergence 
to the infinite sequence length form is not uniform, as can 
be seen by the Is + Ins dependence of our upper bound. 



b. Simplification of the Infinite Sequence Length Equations 



We wish to show that, as L — ► oo, we may assume that 
y ls 1ns i n becomes dependent only on Is + Ins, which 
will thereby allow us to considerably simplify the infinite 
sequence length equations (Eq. (A4)). 

To proceed with this simplification, let us first de- 
termine the effect that y ls 1ns j ns depending only on 
Is + Ins has on z, , r . We have, z, , j = 

J " a is, Ins, Ins ' <s,'ivs,<ivs 

C ls,lNS,lNS y is,lNS,lNS- BUt ' ^S.ijVS.ijVS = Vls^Nsfl = 

z Is,Ins,o/Ci s ,i ns ,o- Putting everything together, we ob- 



tain, 



a 



%,lNS,lNS={f N S s )( 2S -' 2 y NSZ ls,lNS,0 (AH) 



A similar procedure yields, 



z Is,Ins,0 



Is + Ins\ li\{L-h -Is-InsV- 
Ins J (h - Ins)KL - h - ls)\ * 
(2S-l)- l » s z ls+lNsfi ,o (A12) 



As L, lj — > oo, we get, 



{l - h - i s - i NS y. 



(Ii-InsV- {L-h-l s )\ 



(— lj -r- ) lNS = (-^V)' NS - (2S - l) lNS (A13) 
L — lj 1 — jj 

giving, z/ s ,i NS ,o = ( ls ^ s )zi s+ i Nsfit0 . Therefore, 

z Is,Ins,Ins = (^ S t l S NS ) (^j)( 2 ^- 2 ) /NS ^+^A0 

(A14) 

We wish to show that it is this relation which is pre- 
served by the evolution equations. Note that at time 



t = 0, we have 



'Is,Ins,Ins 



7}8i s+ i NS fi, so that this rela- 



tion holds at t = 0. If we can show that if this relation 

holds for all z, , j at some time t, then it holds for 

is ,1ns, Ins 

dz ls 1ns j Ns /dt, it follows that it holds throughout the 
evolution. 

We note also that k, , r only depends on Is + Ins 

* S y> N S j«JV S 

in the limit of infinite sequence length. Therefore, we 
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may define Ki s +i NS — k [s 1ns j , with kq — k, and k\ = notation from Is, Ins, Ins to p, q, r, we have, 
1 otherwise. So, suppose at some time t we have that Eq. 
(A14) holds for all Is, Ins, Ins- Then, after switching 



dz 
dt 



t £ ± ^ - 2 >'«— C + 9 ;t ~ " ~ ') (* ; . 

j=0 k = 1=0 J r J / \ 

(2S - 2) r ~ Z Z p+ q_j_ fe _ ; ,o,0 - (Kp,q,r + «(*)) ^ ^ (2S - 2) r Z p+qfi ,0 

\ p ) W ~ ^ m\ ^ mK P+i- mZ P+i 11 f x 

\ F / \ / m=0 



E ( P M q 7)(^- ( W + «(*)) (' ! 9 )f !) (2* - 2) Wo 



P+<j\ \iJ\ k J\lJ l ^P'^^^'n p 

m J j+k+l=m,(j,k,l)e[0,p]x[0,q-r]x[0,r] V / ^ ' ^ ' ^ 

P + 9 V^(25-2r^±M£ (A15) 



The last two lines are derived by noting that the prod- 
uct of the factorials, ( p+q - J S j h ~ l ) (^tj 1 ) , is equal 

to - (j+fc+i)! ^(^IP ' th ° n by n ° ting ^ 

E(p\ (q- r \ ( r \ — (p+i\ 
j + k+l=m,(j,k,l)£[0,p]x[0,q-r]x[0,r] \jj V fc Wi/ V m / ' 

This relation can be derived by expanding (x + 
in two different ways: First by direct expansion using 
the binomial theorem, and second by expanding (x+ l) p , 
(x+ l) 9 ~ r , (a;+ l) r separately, and then taking the prod- 
uct. Matching powers of x yields the relation given above. 

Note that that we have shown that z p i9ir = 
( P T)(r)( 2S - 2)%+ g ,o,o for all p,q,r throughout the 
evolution. Then given some I, let us collect all the 
population at Hamming distance I from cr by defin- 
ing wi = Em=o YJ?=o z l-m,m,r- We then have, Wi = 

*l,o,o ELo Er=o (™) (?) (25 - 2f = (2S) l z w . There- 
fore, using the expression for dzifi$/dt, we immediately 
obtain the infinite sequence length equations given by Eq. 
(17). 

APPENDIX B: NOTES ON THE 
IMPLEMENTATION OF THE STOCHASTIC 
SIMULATIONS 

To allow for independent verification of the semiconser- 
vative error catastrophe, we develop a stochastic frame- 
work that directly simulates the population dynamics of 
evolving organisms. These simulations are stochastic in 
that replication and error events occur with some prob- 
ability at each time step in the simulation. 

Although the stochastic system we develop mirrors the 



system described in the semiconservative quasispecies 
equations, we do not include any a priori information 
from these equations in our simulations. 

The stochastic system consists of a population of TV 
organisms. Each organism contains a genome with two 
strands: a and its complement d. Each genome se- 
quence is associated with a probability of replication 
PR,{a,a} that describes the probability that an individ- 
ual with that genome will replicate at each time step. 
If a genome is chosen to replicate, the strands separate 
into two daughter organisms which proceed to replicate 
the opposing strand. Each base on that strand is cor- 
rectly replicated with a probability of 1 — e. After com- 
plimentary strand synthesis, lesions in the genome of the 
daughter organisms are repaired, and in the absence of 
parent-strand-specific repair mechanisms the "correct" 
base-pairing is retained in the daugher with probability 
1/2. 

In order to maintain a relatively constant population 
size in our system, we impose the constraint that N is 
less than or equal to some N max . To meet this constraint, 
after a round of replication proceeds through the popu- 
lation, if N > N max , individuals are removed from the 
population until TV = N max . Each individual in the pop- 
ulation has equal probability (1/N) of being removed at 
this step. 

Although we employ a single-gene, single-fitness-peak 
model for the purposes of this work, the stochastic frame- 
work described here is very general and may be employed 
to explore the dynamics of populations in much more 
complicated systems. 
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